The past few years have seen rapid progress in combining reinforcement learning (RL) with deep learning. Breakthroughs ranging from games to robotics have spurred interest in designing sophisticated RL algorithms and systems. However, the prevailing workflow in RL is to learn tabula rasa, which can be computationally inefficient. This precludes continuous deployment of RL algorithms and potentially excludes researchers without large-scale computing resources. In many other areas of machine learning, the pretraining paradigm has been shown to be effective in acquiring transferable knowledge, which can be utilized for a variety of downstream tasks. Recently, there has been a surge of interest in pretraining for deep RL, with promising results. However, much of the research has been based on different experimental settings. Due to the nature of RL, pretraining in this field faces unique challenges and hence requires new design principles. In this survey, we seek to systematically review existing works in pretraining for deep reinforcement learning, provide a taxonomy of these methods, discuss each sub-field, and bring attention to open problems and future directions.
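We study the adaptation of soft actor-critic (SAC) from continuous action spaces to discrete action spaces. We revisit vanilla SAC and provide an in-depth understanding of its Q-value underestimation and performance instability issues when applied to discrete settings. We therefore propose entropy-penalty and double average Q-learning with Q-clip to address these issues. Extensive experiments on typical benchmarks with discrete action spaces, including Atari games and a large-scale MOBA game, show the efficacy of our proposed method. Our code is at: https://github.com/coldsummerday/revisiting-discrete-sac.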
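For readers unfamiliar with the discrete setting, the soft state value that SAC bootstraps from can be written as V(s) = sum_a pi(a|s) * (Q(s,a) - alpha * log pi(a|s)). The sketch below computes it for one state and contrasts two ways of combining twin critics: the element-wise minimum used in clipped double Q-learning, and an element-wise average. Treating "double average Q-learning" as averaging the two critics is an assumption made for illustration, and the function names are hypothetical; this is not the authors' implementation and omits the entropy penalty and Q-clip.

```python
import math


def soft_state_value(q_values, probs, alpha):
    """Soft value of one state for a discrete policy.

    V(s) = sum_a pi(a|s) * (Q(s, a) - alpha * log pi(a|s)).
    Actions with zero probability contribute nothing and are skipped.
    """
    return sum(
        p * (q - alpha * math.log(p))
        for q, p in zip(q_values, probs)
        if p > 0.0
    )


def combine_min(q1, q2):
    """Clipped double Q: element-wise minimum of the two critics."""
    return [min(a, b) for a, b in zip(q1, q2)]


def combine_mean(q1, q2):
    """Assumed reading of 'double average': element-wise mean of the critics."""
    return [(a + b) / 2.0 for a, b in zip(q1, q2)]


if __name__ == "__main__":
    probs = [0.5, 0.5]          # hypothetical policy over two actions
    q1, q2 = [1.0, 3.0], [2.0, 2.0]  # hypothetical twin-critic outputs
    alpha = 0.2                 # hypothetical temperature
    print(soft_state_value(combine_min(q1, q2), probs, alpha))
    print(soft_state_value(combine_mean(q1, q2), probs, alpha))
```

Because the minimum is never larger than the mean, the averaged target is never lower than the clipped one, which is one way to see why averaging could counteract underestimation.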
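Artificial intelligence has deeply revolutionized the field of medicinal chemistry with many impressive applications, but the success of these applications requires a massive number of training samples with high-quality annotations, which seriously limits the wide use of data-driven methods. In this paper, we focus on the reaction yield prediction problem, which helps chemists select high-yield reactions in a new chemical space with only a few experimental trials. To address this challenge, we first propose MetaRF, an attention-based random forest model specially designed for few-shot yield prediction, in which the attention weights of the random forest are automatically optimized by a meta-learning framework and can be quickly adapted to predict the performance of new reagents given a few additional samples. To improve few-shot learning performance, we further introduce a dimension-reduction based sampling method to determine valuable samples to be experimentally tested and then learned. Our methodology is evaluated on three different datasets and achieves satisfactory performance on few-shot prediction. On high-throughput experimentation (HTE) datasets, the average yield of our methodology's top 10 high-yield reactions is relatively close to the result of ideal yield selection.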
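Acne detection is crucial for interpretable diagnosis and precise treatment of skin disease. The arbitrary boundaries and small size of acne lesions lead to a large number of poor-quality proposals in two-stage detection. In this paper, we propose a novel head structure for the Region Proposal Network to improve proposal quality in two ways. First, a Spatial Aware Double Head (SADH) structure is proposed to disentangle the representation learning for classification and localization from two different spatial perspectives. The proposed SADH ensures a steeper classification confidence gradient and suppresses proposals that have low intersection-over-union (IoU) with the matched ground truth. Then, we propose a Normalized Wasserstein Distance prediction branch to improve the correlation between the proposals' classification scores and their IoUs. In addition, to facilitate further research on acne detection, we construct a new dataset named AcneSCU, with high-resolution imagery, precise annotations, and fine-grained lesion categories. Extensive experiments are conducted on both AcneSCU and the public dataset ACNE04, and the results show that the proposed method improves proposal quality and consistently outperforms state-of-the-art approaches. Code and the collected dataset are available at https://github.com/pingguokiller/acnedetection.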
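To illustrate why a Wasserstein-based measure can be attractive for small lesions, the sketch below compares IoU with the normalized Gaussian Wasserstein distance as it is commonly defined for tiny-object detection: each box (cx, cy, w, h) is modeled as a 2D Gaussian, and NWD = exp(-W2 / C), where W2 is the Euclidean distance between the vectors (cx, cy, w/2, h/2). The constant `c` and the example boxes are placeholder assumptions, and this shows only the distance itself, not the prediction branch described in the abstract.

```python
import math


def iou(a, b):
    """Intersection-over-union of two boxes given as (cx, cy, w, h)."""
    ax1, ay1 = a[0] - a[2] / 2.0, a[1] - a[3] / 2.0
    ax2, ay2 = a[0] + a[2] / 2.0, a[1] + a[3] / 2.0
    bx1, by1 = b[0] - b[2] / 2.0, b[1] - b[3] / 2.0
    bx2, by2 = b[0] + b[2] / 2.0, b[1] + b[3] / 2.0
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0.0 else 0.0


def nwd(a, b, c):
    """Normalized Wasserstein distance between two (cx, cy, w, h) boxes.

    `c` is a dataset-dependent normalizing constant; any value passed
    here is a placeholder, not a tuned setting.
    """
    w2 = math.sqrt(
        (a[0] - b[0]) ** 2
        + (a[1] - b[1]) ** 2
        + ((a[2] - b[2]) / 2.0) ** 2
        + ((a[3] - b[3]) / 2.0) ** 2
    )
    return math.exp(-w2 / c)


if __name__ == "__main__":
    proposal = (10.0, 10.0, 4.0, 4.0)      # hypothetical small proposal
    ground_truth = (15.0, 10.0, 4.0, 4.0)  # hypothetical nearby lesion
    print(iou(proposal, ground_truth))        # no overlap, so IoU is zero
    print(nwd(proposal, ground_truth, 10.0))  # still a smooth, nonzero score
```

For two small boxes that just miss each other, IoU drops to zero and carries no information about how close they are, while the Wasserstein-based score still decreases smoothly with distance.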
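Learning rational behaviors in open-world games such as Minecraft remains challenging for reinforcement learning (RL) research because of the combined difficulties of partial observability, high-dimensional visual perception, and delayed reward. To address this, we propose JueWu-MC, a sample-efficient hierarchical RL approach equipped with representation learning and imitation learning to deal with perception and exploration. Specifically, our approach includes two levels of hierarchy, where the high-level controller learns a policy to control over options and the low-level workers learn to solve each sub-task. To boost the learning of sub-tasks, we propose a combination of techniques including 1) action-aware representation learning, which captures the underlying relations between actions and representations, 2) discriminator-based self-imitation learning for efficient exploration, and 3) ensemble behavior cloning with consistency filtering for policy robustness. Extensive experiments show that JueWu-MC significantly improves sample efficiency and outperforms a set of baselines by a large margin. Notably, we won the championship of the NeurIPS MineRL 2021 research competition and achieved the highest performance score.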
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even when the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
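As a rough illustration of what "adding triggers to the raw data" can look like in the image domain, the sketch below stamps a fixed square patch into the corner of selected images and relabels them to a target class before any distillation would run. The patch size, position, pixel value, poisoning schedule, and all function names are hypothetical choices for this example; the abstract does not specify them, and this is not the authors' NAIVEATTACK code.

```python
def add_trigger(image, patch_size=2, value=1.0):
    """Return a copy of `image` (a list of rows of floats) with a square
    patch of `value` stamped into the bottom-right corner."""
    stamped = [list(row) for row in image]  # copy so the input stays intact
    height, width = len(stamped), len(stamped[0])
    for r in range(height - patch_size, height):
        for c in range(width - patch_size, width):
            stamped[r][c] = value
    return stamped


def poison_dataset(samples, target_label, every_nth=10):
    """Stamp the trigger on every `every_nth` sample and relabel it.

    `samples` is a list of (image, label) pairs; untouched samples are
    passed through unchanged.
    """
    poisoned = []
    for index, (image, label) in enumerate(samples):
        if index % every_nth == 0:
            poisoned.append((add_trigger(image), target_label))
        else:
            poisoned.append((image, label))
    return poisoned


if __name__ == "__main__":
    clean = [([[0.0] * 4 for _ in range(4)], 3) for _ in range(20)]
    mixed = poison_dataset(clean, target_label=0, every_nth=10)
    print(sum(1 for _, label in mixed if label == 0))  # count relabeled samples
```

In this static variant the trigger never changes; an iterative attack would instead keep updating the patch while the synthetic data is being optimized.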
Automatic music generation with artificial intelligence typically requires a large amount of data, which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by finetuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances. We show that by doing so, one of the largest state-of-the-art models (GPT3) is capable of generating reasonable drum grooves, while models that are not pre-trained (Transformer) show no such ability beyond naive repetition. Evaluating generated music is a challenging task, and evaluating drum grooves is even more so, given how little precedent exists in the literature. Hence, we propose a tailored structural evaluation method and analyze drum grooves produced by GPT3 compared to those played by human professionals, exposing the strengths and weaknesses of such generation by language-to-music transfer. Our findings suggest that language-to-music transfer learning with large language models is viable and promising.
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, performance on novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarking results on the COCO dataset for FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs have a large number of parameters, which makes them computationally expensive. Therefore, it is difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution to compress GNNs, where a lightweight model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness consideration. As a consequence, the student model usually inherits and even exaggerates the bias from the teacher GNN. To handle such a problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborate that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
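The teacher-student mimicry described above is commonly implemented with the classic temperature-scaled distillation objective: the student's softened class distribution is pulled toward the teacher's via a KL divergence scaled by T squared. The sketch below shows only that standard term for a single node's logits; the temperature, logits, and function names are hypothetical, and it contains no fairness component, so it should not be read as RELIANT itself.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(
        p * math.log(p / q)
        for p, q in zip(teacher, student)
        if p > 0.0
    )
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher_logits = [2.0, 0.5, -1.0]  # hypothetical teacher GNN output for one node
    student_logits = [1.0, 1.0, 0.0]   # hypothetical student output for the same node
    print(distillation_loss(student_logits, teacher_logits))
```

Because this loss rewards matching the teacher's outputs exactly, any bias encoded in those outputs is a direct training target for the student, which is the inheritance problem the abstract points to.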